Semi-Supervised Support Vector Machines for Unlabeled Data Classification

Author

  • Glenn Fung
Abstract

A concave minimization approach is proposed for classifying unlabeled data based on the following ideas: (i) A small representative percentage (5% to 10%) of the unlabeled data is chosen by a clustering algorithm and given to an expert or oracle to label. (ii) A linear support vector machine is trained using the small labeled sample while simultaneously assigning the remaining bulk of the unlabeled dataset to one of two classes so as to maximize the margin (distance) between the two bounding planes that determine the separating plane midway between them. This latter problem is formulated as a concave minimization problem on a polyhedral set for which a stationary point is quickly obtained by solving a few (5 to 7) linear programs. Such stationary points turn out to be very effective as evidenced by our computational results which show that clustered concave minimization yields: (a) Test set improvement as high as 20.4% over a linear support vector machine trained on a correspondingly small but randomly chosen subset that is labeled by an expert. (b) Test set correctness averaged to within 5.1% when compared to that of a completely supervised linear support vector machine trained on the entire dataset which has been labeled by an expert.
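Step (i) of the abstract, choosing a small representative subset for the expert to label, can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract says only that "a clustering algorithm" is used, so plain k-means is an assumption here, and the names `sq_dist`, `kmeans`, and `select_for_labeling` are hypothetical. The sketch picks, for each cluster, the data point closest to the cluster center. Step (ii), the concave minimization solved by a few linear programs, is not covered.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means (an assumed stand-in for the paper's
    unspecified clustering algorithm). Returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assign every point to its nearest current center.
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def select_for_labeling(points, fraction=0.1, seed=0):
    """Return indices of points to hand to the expert: one per non-empty
    cluster, namely the member nearest to that cluster's center."""
    k = max(1, round(fraction * len(points)))
    centers, assign = kmeans(points, k, seed=seed)
    chosen = []
    for j, center in enumerate(centers):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            chosen.append(min(members, key=lambda i: sq_dist(points[i], center)))
    return sorted(chosen)
```

With `fraction=0.1` the number of clusters, and so the number of points sent for labeling, is about 10% of the dataset, matching the upper end of the 5% to 10% range quoted in the abstract.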


Similar resources

Is Unlabeled Data Suitable for Multiclass SVM-based Web Page Classification?

Support Vector Machines present an interesting and effective approach to solving automated classification tasks. Although they handle only binary and supervised problems by nature, they have been adapted into multiclass and semi-supervised approaches in several works. A previous study on supervised and semi-supervised SVM classification over binary taxonomies showed how the latter clearly outperf...


Margin-based Semi-supervised Learning

In classification, semi-supervised learning occurs when a large amount of unlabeled data is available with only a small number of labeled data. In such a situation, how to enhance predictability of classification through unlabeled data is the focus. In this article, we introduce a novel large margin semi-supervised learning methodology, utilizing grouping information from unlabeled data, togeth...


A Comparison of Discriminative EM-Based Semi-Supervised Learning algorithms on Agreement/Disagreement classification

Recently, semi-supervised learning has been an active research topic in the natural language processing community, to save effort in hand-labeling for data-driven learning and to exploit a large amount of readily available unlabeled text. In this paper, we apply EM-based semi-supervised learning algorithms such as traditional EM, co-EM, and cross validation EM to the task of agreement/disagreem...


Large Margin Semi-supervised Learning

In classification, semi-supervised learning occurs when a large amount of unlabeled data is available with only a small number of labeled data. In such a situation, how to enhance predictability of classification through unlabeled data is the focus. In this article, we introduce a novel large margin semi-supervised learning methodology, using grouping information from unlabeled data, together w...


An Inexact Implementation of Smoothing Homotopy Method for Semi-Supervised Support Vector Machines

The semi-supervised support vector machine is an appealing method for using unlabeled data in classification. The smoothing homotopy method is one feasible method for solving semi-supervised support vector machines. In this paper, an inexact implementation of the smoothing homotopy method is considered. The numerical implementation is based on a truncated smoothing technique. By the new technique, ...


Sparse Quasi-Newton Optimization for Semi-supervised Support Vector Machines

In real-world scenarios, labeled data is often rare while unlabeled data can be obtained in huge quantities. A current research direction in machine learning is the concept of semi-supervised support vector machines. This type of binary classification approach aims at taking the additional information provided by unlabeled patterns into account to reveal more information about the structure of ...



Publication date: 2001